
[Quantization] - Added uses_meta_device_weights to quant config#34645

Merged
vllm-bot merged 7 commits into vllm-project:main from Josephasafg:add_meta_weights_check
Feb 18, 2026

Conversation

@Josephasafg
Contributor

@Josephasafg Josephasafg commented Feb 16, 2026

Purpose

As more quant methods start to support online quantization, we need a more robust way to check that they load dummy weights consistently via process_weights_after_loading.
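A minimal sketch of what such a generic check could look like (class and helper names below are illustrative assumptions, not the PR's exact code): the quant config exposes a predicate that the weight loader queries instead of hard-coding a check for fp8.

```python
# Illustrative sketch only -- names are assumptions, not vLLM's actual classes.
from typing import Optional


class QuantizationConfig:
    """Base quant config: weights are materialized normally at load time."""

    def uses_meta_device_weights(self) -> bool:
        return False


class Fp8OnlineQuantConfig(QuantizationConfig):
    """Online-quant config: weights start on device='meta' and are only
    initialized later, inside process_weights_after_loading."""

    def uses_meta_device_weights(self) -> bool:
        return True


def should_skip_dummy_init(quant_config: Optional[QuantizationConfig]) -> bool:
    # Generic check replacing a hard-coded `quantization == "fp8"` test.
    return quant_config is not None and quant_config.uses_meta_device_weights()
```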

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a uses_meta_device_weights method to the quantization configuration, providing a more robust way to handle online quantization. The change refactors a hardcoded check for fp8 quantization to a more generic mechanism. The implementation is sound, but there's a critical issue where a None value for model_config.quantization could cause a crash. I've provided a suggestion to handle this case gracefully.
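The None-handling issue the bot flags could be guarded roughly like this (a hedged sketch; ModelConfig here is a stand-in stub, and the actual suggested fix lives in the review thread):

```python
# Hedged sketch: model_config.quantization may be None when no quant
# method is configured, so check before using it in a string comparison.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """Stub stand-in for vLLM's ModelConfig, for illustration only."""
    quantization: Optional[str] = None


def is_online_quant(model_config: ModelConfig) -> bool:
    # Dereferencing None (e.g. calling .startswith on it) would raise
    # AttributeError; returning False is the graceful fallback.
    if model_config.quantization is None:
        return False
    return model_config.quantization == "fp8"
```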

Josephasafg and others added 2 commits February 17, 2026 14:11
Signed-off-by: Josephasafg <ajgard7@gmail.com>
@Josephasafg
Contributor Author

Josephasafg commented Feb 17, 2026

@vkuzo Thanks for the review!

Who should trigger the CI?

Signed-off-by: Josephasafg <ajgard7@gmail.com>
@Josephasafg Josephasafg requested a review from mgoin February 17, 2026 20:46
@mgoin
Member

mgoin commented Feb 17, 2026

@Josephasafg @vkuzo I would prefer to keep the information on the linear method itself rather than the top-level quant config. What do you think about this proposal?

  1. Add uses_meta_device: bool = False to QuantizeMethodBase in base_config.py
  2. Set uses_meta_device = True on Fp8OnlineLinearMethod and Fp8OnlineMoEMethod (the methods that actually create weights on device="meta")
  3. In initialize_dummy_weights, instead of checking the quant config class, iterate over model.modules() and check if any module has a quant_method with uses_meta_device = True

Something like this:

def initialize_dummy_weights(model, model_config, ...):
    meta_device_params: set[int] = set()
    for module in model.modules():
        qm = getattr(module, "quant_method", None)
        if qm is not None and getattr(qm, "uses_meta_device", False):
            for param in module.parameters(recurse=False):
                meta_device_params.add(id(param))

    for param in model.state_dict().values():
        if id(param) in meta_device_params \
                and param.device == torch.device("meta"):
            continue
        initialize_single_dummy_weight(param, low, high, seed)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
@Josephasafg
Contributor Author

@mgoin @vkuzo I made the change but made it a little simpler. How does this look?

def initialize_dummy_weights(
    model: torch.nn.Module,
    model_config: ModelConfig,
    low: float = -1e-3,
    high: float = 1e-3,
    seed: int = 1234,
) -> None:
    def uses_meta_device(module: torch.nn.Module) -> bool:
        quant_method = getattr(module, "quant_method", None)
        return getattr(quant_method, "uses_meta_device", False)

    has_online_quant = any(uses_meta_device(m) for m in model.modules())

    for param in model.state_dict().values():
        if has_online_quant and param.device == torch.device("meta"):
            # For online quantization, weights are created on meta device and
            # dummy weight init will happen in `process_weights_after_loading`.
            continue

        initialize_single_dummy_weight(param, low, high, seed)
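The detection pattern in the snippet above can be exercised without torch using stub objects (everything below is a toy stand-in, not vLLM code):

```python
# Toy stand-ins showing the any(...) detection pattern: a module is flagged
# only when its quant_method exists and sets uses_meta_device = True.
class StubQuantMethod:
    uses_meta_device = True


class StubModule:
    def __init__(self, quant_method=None):
        self.quant_method = quant_method


def uses_meta_device(module) -> bool:
    quant_method = getattr(module, "quant_method", None)
    # getattr on None safely falls back to False here.
    return getattr(quant_method, "uses_meta_device", False)


plain = StubModule()
online = StubModule(quant_method=StubQuantMethod())

print(any(uses_meta_device(m) for m in [plain]))          # False
print(any(uses_meta_device(m) for m in [plain, online]))  # True
```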

Member

@mgoin mgoin left a comment


Nice work, I'm quite happy with this!

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and quantization labels Feb 18, 2026
@mgoin mgoin enabled auto-merge (squash) February 18, 2026 01:01
@vllm-bot vllm-bot merged commit 1faa8cb into vllm-project:main Feb 18, 2026
57 of 62 checks passed
jasonozuzu-cohere pushed a commit to jasonozuzu-cohere/vllm that referenced this pull request Feb 18, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com>
ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: EricccYang <yangyang4991@gmail.com>

Labels

quantization, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants